Goto

Collaborating Authors

 Sterling County


Ancient origin of an urban underground mosquito Science

Science

Understanding how life is adapting to urban environments represents an important challenge in evolutionary biology. In this work, we investigate a widely cited example of urban adaptation, Culex pi...


No Regularization is Needed: An Efficient and Effective Model for Incomplete Label Distribution Learning

Li, Xiang, Chen, Songcan

arXiv.org Artificial Intelligence

Label Distribution Learning (LDL) assigns soft labels, a.k.a. degrees, to a sample. In reality, it is always laborious to obtain complete degrees, giving birth to the Incomplete LDL (InLDL). However, InLDL often suffers from performance degeneration. To remedy it, existing methods need one or more explicit regularizations, leading to burdensome parameter tuning and extra computation. We argue that label distribution itself may provide useful prior, when used appropriately, the InLDL problem can be solved without any explicit regularization. In this paper, we offer a rational alternative to use such a prior. Our intuition is that large degrees are likely to get more concern, the small ones are easily overlooked, whereas the missing degrees are completely neglected in InLDL. To learn an accurate label distribution, it is crucial not to ignore the small observed degrees but to give them properly large weights, while gradually increasing the weights of the missing degrees. To this end, we first define a weighted empirical risk and derive upper bounds between the expected risk and the weighted empirical risk, which reveals in principle that weighting plays an implicit regularization role. Then, by using the prior of degrees, we design a weighted scheme and verify its effectiveness. To sum up, our model has four advantages, it is 1) model selection free, as no explicit regularization is imposed; 2) with closed form solution (sub-problem) and easy-to-implement (a few lines of codes); 3) with linear computational complexity in the number of samples, thus scalable to large datasets; 4) competitive with state-of-the-arts even without any explicit regularization.


Exploiting Feature Diversity for Make-up Temporal Video Grounding

Shu, Xiujun, Wen, Wei, Guo, Taian, He, Sunan, Wu, Chen, Qiao, Ruizhi

arXiv.org Artificial Intelligence

This technical report presents the 3rd winning solution for MTVG, a new task introduced in the 4-th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims at localizing the temporal boundary of the step in an untrimmed video based on a textual description. The biggest challenge of this task is the fi ne-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. As actions are more coarse-grained than make-up steps, action-based features are not sufficient to provide fi ne-grained cues. To address this issue,we propose to achieve fi ne-grained representation via exploiting feature diversities. Specifically, we proposed a series of methods from feature extraction, network optimization, to model ensemble. As a result, we achieved 3rd place in the MTVG competition.


Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

Bhakthavatsalam, Sumithra, Khashabi, Daniel, Khot, Tushar, Mishra, Bhavana Dalvi, Richardson, Kyle, Sabharwal, Ashish, Schoenick, Carissa, Tafjord, Oyvind, Clark, Peter

arXiv.org Artificial Intelligence

We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review. The resulting dataset contains 2985 questions with a total of 8436 valid answers (questions typically have more than one valid answer). ARC-DA is one of the first DA datasets of natural questions that often require reasoning, and where appropriate question decompositions are not evident from the questions themselves. We describe the conversion approach taken, appropriate evaluation metrics, and several strong models. Although high, the best scores (81% GENIE, 61.4% F1, 63.2% ROUGE-L) still leave considerable room for improvement. In addition, the dataset provides a natural setting for new research on explanation, as many questions require reasoning to construct answers. We hope the dataset spurs further advances in complex question-answering by the community. ARC-DA is available at https://allenai.org/data/arc-da